Error using read_sql_table

Hello community, I have a problem with a certain block of code. When I execute the following block of code:

ddf = dd.read_sql_table(‘gasto_mensual_replica’,
schema=schema,
con=con_str,
index_col=‘id’,
npartitions=npartitions,
columns=[‘categoria_gasto’,‘ano_eje’,‘nivel_gobierno_nombre’,‘pliego’,‘sec_ejec’,‘ejecutora’,‘division_funcional’,‘grupo_funcional’],
meta=meta)

I don’t have any problems, meaning it runs successfully with the information provided in the ‘columns’ parameter, which contains a list of certain column names from my table.

Now, I understand that if I want to read all the columns of my table, I should remove the “columns” parameter from my read_sql_table function. When I do that, the code looks like this:

ddf = dd.read_sql_table(‘gasto_mensual_replica’,
schema=schema,
con=con_str,
index_col=‘id’,
npartitions=npartitions,
meta=meta)

I get a large error where the crucial part is the following:

DatabaseError: (cx_Oracle.DatabaseError) ORA-00904: “ANON_1”.“FUENTE_FINANCIAMIENTO_NO_6”: invalid identifier

I understand that it’s an error indicating that the column “ANON_1”.“FUENTE_FINANCIAMIENTO_NO_6” was not found. However, my question is, why does it inform me of this if I don’t have any column named like that in my “meta” object or in my “gasto_mensual_replica” table? I’ve been reading that it’s possible that an alias is generated for better table reading as part of the sqlalchemy process, but I don’t see why aliases would be generated.

Thank you very much for your responses.

Hi @Rhym2024, welcome to Dask Discourse forum!

Well, I have to admit that it’s hard to tell from where this could come. Could you copy and past the full stack trace?

Do you have a column named like that somewhere in your database? What happens when you remove the meta kwarg?

Hi, thanks for your answer!

The full stack trace is (with or without the meta parameter):


DatabaseError Traceback (most recent call last)
File C:\ProgramData\anaconda3\envs\ee3110\Lib\site-packages\sqlalchemy\engine\base.py:1971, in Connection._exec_single_context(self, dialect, context, statement, parameters)
1970 if not evt_handled:
→ 1971 self.dialect.do_execute(
1972 cursor, str_statement, effective_parameters, context
1973 )
1975 if self._has_events or self.engine._has_events:

File C:\ProgramData\anaconda3\envs\ee3110\Lib\site-packages\sqlalchemy\engine\default.py:919, in DefaultDialect.do_execute(self, cursor, statement, parameters, context)
918 def do_execute(self, cursor, statement, parameters, context=None):
→ 919 cursor.execute(statement, parameters)

DatabaseError: ORA-00904: “ANON_1”.“FUENTE_FINANCIAMIENTO_NO_6”: invalid identifier

The above exception was the direct cause of the following exception:

DatabaseError Traceback (most recent call last)
Cell In[21], line 1
----> 1 ddf = dd.read_sql_table(‘gasto_mensual_replica’,
2 schema=schema,
3 con=con_str,
4 index_col=‘id’,
5 npartitions=npartitions)

File C:\ProgramData\anaconda3\envs\ee3110\Lib\site-packages\dask_expr\io\sql.py:28, in read_sql_table(table_name, con, index_col, divisions, npartitions, limits, columns, bytes_per_chunk, head_rows, schema, meta, engine_kwargs, **kwargs)
11 def read_sql_table(
12 table_name,
13 con,
(…)
24 **kwargs,
25 ):
26 from dask.dataframe.io.sql import read_sql_table as _read_sql_table
—> 28 df = _read_sql_table(
29 table_name,
30 con,
31 index_col,
32 divisions=divisions,
33 npartitions=npartitions,
34 limits=limits,
35 columns=columns,
36 bytes_per_chunk=bytes_per_chunk,
37 head_rows=head_rows,
38 schema=schema,
39 meta=meta,
40 engine_kwargs=engine_kwargs,
41 **kwargs,
42 )
43 return from_dask_dataframe(df)

File C:\ProgramData\anaconda3\envs\ee3110\Lib\site-packages\dask\dataframe\io\sql.py:345, in read_sql_table(table_name, con, index_col, divisions, npartitions, limits, columns, bytes_per_chunk, head_rows, schema, meta, engine_kwargs, **kwargs)
341 columns.append(index)
343 query = sql.select(*columns).select_from(table_name)
→ 345 return read_sql_query(
346 sql=query,
347 con=con,
348 index_col=index,
349 divisions=divisions,
350 npartitions=npartitions,
351 limits=limits,
352 bytes_per_chunk=bytes_per_chunk,
353 head_rows=head_rows,
354 meta=meta,
355 engine_kwargs=engine_kwargs,
356 **kwargs,
357 )

File C:\ProgramData\anaconda3\envs\ee3110\Lib\site-packages\dask\dataframe\io\sql.py:123, in read_sql_query(sql, con, index_col, divisions, npartitions, limits, bytes_per_chunk, head_rows, meta, engine_kwargs, **kwargs)
120 if head_rows > 0:
121 # derive metadata from first few rows
122 q = sql.limit(head_rows)
→ 123 head = pd.read_sql(q, engine, **kwargs)
125 if len(head) == 0:
126 # no results at all
127 return from_pandas(head, npartitions=1)

File C:\ProgramData\anaconda3\envs\ee3110\Lib\site-packages\pandas\io\sql.py:734, in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize, dtype_backend, dtype)
724 return pandas_sql.read_table(
725 sql,
726 index_col=index_col,
(…)
731 dtype_backend=dtype_backend,
732 )
733 else:
→ 734 return pandas_sql.read_query(
735 sql,
736 index_col=index_col,
737 params=params,
738 coerce_float=coerce_float,
739 parse_dates=parse_dates,
740 chunksize=chunksize,
741 dtype_backend=dtype_backend,
742 dtype=dtype,
743 )

File C:\ProgramData\anaconda3\envs\ee3110\Lib\site-packages\pandas\io\sql.py:1836, in SQLDatabase.read_query(self, sql, index_col, coerce_float, parse_dates, params, chunksize, dtype, dtype_backend)
1779 def read_query(
1780 self,
1781 sql: str,
(…)
1788 dtype_backend: DtypeBackend | Literal[“numpy”] = “numpy”,
1789 ) → DataFrame | Iterator[DataFrame]:
1790 “”"
1791 Read SQL query into a DataFrame.
1792
(…)
1834
1835 “”"
→ 1836 result = self.execute(sql, params)
1837 columns = result.keys()
1839 if chunksize is not None:

File C:\ProgramData\anaconda3\envs\ee3110\Lib\site-packages\pandas\io\sql.py:1660, in SQLDatabase.execute(self, sql, params)
1658 if isinstance(sql, str):
1659 return self.con.exec_driver_sql(sql, *args)
→ 1660 return self.con.execute(sql, *args)

File C:\ProgramData\anaconda3\envs\ee3110\Lib\site-packages\sqlalchemy\engine\base.py:1422, in Connection.execute(self, statement, parameters, execution_options)
1420 raise exc.ObjectNotExecutableError(statement) from err
1421 else:
→ 1422 return meth(
1423 self,
1424 distilled_parameters,
1425 execution_options or NO_OPTIONS,
1426 )

File C:\ProgramData\anaconda3\envs\ee3110\Lib\site-packages\sqlalchemy\sql\elements.py:514, in ClauseElement._execute_on_connection(self, connection, distilled_params, execution_options)
512 if TYPE_CHECKING:
513 assert isinstance(self, Executable)
→ 514 return connection._execute_clauseelement(
515 self, distilled_params, execution_options
516 )
517 else:
518 raise exc.ObjectNotExecutableError(self)

File C:\ProgramData\anaconda3\envs\ee3110\Lib\site-packages\sqlalchemy\engine\base.py:1644, in Connection._execute_clauseelement(self, elem, distilled_parameters, execution_options)
1632 compiled_cache: Optional[CompiledCacheType] = execution_options.get(
1633 “compiled_cache”, self.engine._compiled_cache
1634 )
1636 compiled_sql, extracted_params, cache_hit = elem._compile_w_cache(
1637 dialect=dialect,
1638 compiled_cache=compiled_cache,
(…)
1642 linting=self.dialect.compiler_linting | compiler.WARN_LINTING,
1643 )
→ 1644 ret = self._execute_context(
1645 dialect,
1646 dialect.execution_ctx_cls._init_compiled,
1647 compiled_sql,
1648 distilled_parameters,
1649 execution_options,
1650 compiled_sql,
1651 distilled_parameters,
1652 elem,
1653 extracted_params,
1654 cache_hit=cache_hit,
1655 )
1656 if has_events:
1657 self.dispatch.after_execute(
1658 self,
1659 elem,
(…)
1663 ret,
1664 )

File C:\ProgramData\anaconda3\envs\ee3110\Lib\site-packages\sqlalchemy\engine\base.py:1850, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw)
1848 return self._exec_insertmany_context(dialect, context)
1849 else:
→ 1850 return self._exec_single_context(
1851 dialect, context, statement, parameters
1852 )

File C:\ProgramData\anaconda3\envs\ee3110\Lib\site-packages\sqlalchemy\engine\base.py:1990, in Connection._exec_single_context(self, dialect, context, statement, parameters)
1987 result = context._setup_result_proxy()
1989 except BaseException as e:
→ 1990 self._handle_dbapi_exception(
1991 e, str_statement, effective_parameters, cursor, context
1992 )
1994 return result

File C:\ProgramData\anaconda3\envs\ee3110\Lib\site-packages\sqlalchemy\engine\base.py:2357, in Connection._handle_dbapi_exception(self, e, statement, parameters, cursor, context, is_sub_exec)
2355 elif should_wrap:
2356 assert sqlalchemy_exception is not None
→ 2357 raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
2358 else:
2359 assert exc_info[1] is not None

File C:\ProgramData\anaconda3\envs\ee3110\Lib\site-packages\sqlalchemy\engine\base.py:1971, in Connection._exec_single_context(self, dialect, context, statement, parameters)
1969 break
1970 if not evt_handled:
→ 1971 self.dialect.do_execute(
1972 cursor, str_statement, effective_parameters, context
1973 )
1975 if self._has_events or self.engine._has_events:
1976 self.dispatch.after_cursor_execute(
1977 self,
1978 cursor,
(…)
1982 context.executemany,
1983 )

File C:\ProgramData\anaconda3\envs\ee3110\Lib\site-packages\sqlalchemy\engine\default.py:919, in DefaultDialect.do_execute(self, cursor, statement, parameters, context)
918 def do_execute(self, cursor, statement, parameters, context=None):
→ 919 cursor.execute(statement, parameters)

DatabaseError: (cx_Oracle.DatabaseError) ORA-00904: “ANON_1”.“FUENTE_FINANCIAMIENTO_NO_6”: invalid identifier
[SQL: SELECT anon_1.ano_eje, anon_1.mes_eje, anon_1.nivel_gobierno, anon_1.nivel_gobierno_nombre, anon_1.sector, anon_1.sector_nombre, anon_1.pliego, anon_1.pliego_nombre, anon_1.sec_ejec, anon_1.ejecutora, anon_1.ejecutora_nombre, anon_1.departamento_ejecutora, anon_1.departamento_ejecutora_n_1, anon_1.provincia_ejecutora, anon_1.provincia_ejecutora_nomb_2, anon_1.distrito_ejecutora, anon_1.distrito_ejecutora_nombr_3, anon_1.sec_func, anon_1.programa_ppto, anon_1.programa_ppto_nombre, anon_1.tipo_act_proy, anon_1.tipo_act_proy_nombre, anon_1.producto_proyecto, anon_1.producto_proyecto_nombre, anon_1.actividad_accion_obra, anon_1.actividad_accion_obra_no_4, anon_1.funcion, anon_1.funcion_nombre, anon_1.division_funcional, anon_1.division_funcional_nombr_5, anon_1.grupo_funcional, anon_1.grupo_funcional_nombre, anon_1.meta, anon_1.finalidad, anon_1.meta_nombre, anon_1.departamento_meta, anon_1.departamento_meta_nombre, anon_1.fuente_financiamiento, anon_1.fuente_financiamiento_no_6, anon_1.rubro, anon_1.rubro_nombre, anon_1.tipo_recurso, anon_1.tipo_recurso_nombre, anon_1.categoria_gasto, anon_1.categoria_gasto_nombre, anon_1.tipo_transaccion, anon_1.generica, anon_1.generica_nombre, anon_1.subgenerica, anon_1.subgenerica_nombre, anon_1.subgenerica_det, anon_1.subgenerica_det_nombre, anon_1.especifica, anon_1.especifica_nombre, anon_1.especifica_det, anon_1.especifica_det_nombre, anon_1.monto_pia, anon_1.monto_pim, anon_1.monto_certificado, anon_1.monto_comprometido_anual, anon_1.monto_comprometido, anon_1.monto_devengado, anon_1.monto_girado, anon_1.id
FROM (SELECT ano_eje, mes_eje, nivel_gobierno, nivel_gobierno_nombre, sector, sector_nombre, pliego, pliego_nombre, sec_ejec, ejecutora, ejecutora_nombre, departamento_ejecutora, departamento_ejecutora_nombre, provincia_ejecutora, provincia_ejecutora_nombre, distrito_ejecutora, distrito_ejecutora_nombre, sec_func, programa_ppto, programa_ppto_nombre, tipo_act_proy, tipo_act_proy_nombre, producto_proyecto, producto_proyecto_nombre, actividad_accion_obra, actividad_accion_obra_nombre, funcion, funcion_nombre, division_funcional, division_funcional_nombre, grupo_funcional, grupo_funcional_nombre, meta, finalidad, meta_nombre, departamento_meta, departamento_meta_nombre, fuente_financiamiento, fuente_financiamiento_nombre, rubro, rubro_nombre, tipo_recurso, tipo_recurso_nombre, categoria_gasto, categoria_gasto_nombre, tipo_transaccion, generica, generica_nombre, subgenerica, subgenerica_nombre, subgenerica_det, subgenerica_det_nombre, especifica, especifica_nombre, especifica_det, especifica_det_nombre, monto_pia, monto_pim, monto_certificado, monto_comprometido_anual, monto_comprometido, monto_devengado, monto_girado, id
FROM “CONSULTORDAS”.gasto_mensual_replica) anon_1
WHERE ROWNUM <= 5]
(Background on this error at: Error Messages — SQLAlchemy 2.0 Documentation)


This is my meta object:

from sqlalchemy.dialects.oracle import NUMBER, VARCHAR2, DATE
from sqlalchemy import Integer, String, Date, Numeric, Float
import numpy as np

sqlalchemy_to_pandas_type_map = {
Integer: np.int64,
String: ‘object’,
Date: ‘datetime64[ns]’,
Numeric: ‘float64’,
Float: ‘float64’,
NUMBER: ‘float64’,
VARCHAR2: ‘object’,
DATE: ‘datetime64[ns]’

}

column_types = {}
for column in table.columns:
pandas_dtype = sqlalchemy_to_pandas_type_map.get(type(column.type), ‘object’)
column_types[column.name] = pandas_dtype

meta = pd.DataFrame({name: pd.Series(dtype=dtype) for name, dtype in column_types.items()})
meta = meta.iloc[:, :-1]


and these are the columns of my meta object:

[‘ano_eje’, ‘mes_eje’, ‘nivel_gobierno’, ‘nivel_gobierno_nombre’, ‘sector’, ‘sector_nombre’, ‘pliego’, ‘pliego_nombre’, ‘sec_ejec’, ‘ejecutora’, ‘ejecutora_nombre’, ‘departamento_ejecutora’, ‘departamento_ejecutora_nombre’, ‘provincia_ejecutora’, ‘provincia_ejecutora_nombre’, ‘distrito_ejecutora’, ‘distrito_ejecutora_nombre’, ‘sec_func’, ‘programa_ppto’, ‘programa_ppto_nombre’, ‘tipo_act_proy’, ‘tipo_act_proy_nombre’, ‘producto_proyecto’, ‘producto_proyecto_nombre’, ‘actividad_accion_obra’, ‘actividad_accion_obra_nombre’, ‘funcion’, ‘funcion_nombre’, ‘division_funcional’, ‘division_funcional_nombre’, ‘grupo_funcional’, ‘grupo_funcional_nombre’, ‘meta’, ‘finalidad’, ‘meta_nombre’, ‘departamento_meta’, ‘departamento_meta_nombre’, ‘fuente_financiamiento’, ‘fuente_financiamiento_nombre’, ‘rubro’, ‘rubro_nombre’, ‘tipo_recurso’, ‘tipo_recurso_nombre’, ‘categoria_gasto’, ‘categoria_gasto_nombre’, ‘tipo_transaccion’, ‘generica’, ‘generica_nombre’, ‘subgenerica’, ‘subgenerica_nombre’, ‘subgenerica_det’, ‘subgenerica_det_nombre’, ‘especifica’, ‘especifica_nombre’, ‘especifica_det’, ‘especifica_det_nombre’, ‘monto_pia’, ‘monto_pim’, ‘monto_certificado’, ‘monto_comprometido_anual’, ‘monto_comprometido’, ‘monto_devengado’, ‘monto_girado’]


the column name of the table “gasto_mensual_replica”, are the same than “meta”.

The only difference in the output of my query for the column names of the table is that in the source database, all of these are in UPPERCASE.

I don’t know where the request to read the column ANON_1.FUENTE_FINANCIAMIENTO_NO_6 is coming from, if it doesn’t exist in my table.

dask : 2024.3.1
sqlalchemy : 2.0.29

I’m not sure why, but it seems some of the column names that have a length > 25 are stripped: departamento_ejecutora_n_1, provincia_ejecutora_nomb_2, distrito_ejecutora_nombr_3, distrito_ejecutora_nombr_3, division_funcional_nombr_5, fuente_financiamiento_no_6.

I really don’t know from where it comes, and also not sure why there are two imbricated SELECT statements. What I would do first is trya select query with just SQLAlchemy, then with Pandas, to be sure that it comes from how Dask builds its query without a column argument.