Environment details
- OS type and version: Debian 10 (dataproc image 2.0-debian10)
- Python version: Python 3.8.10 (python --version)
- pip version: pip 21.1.2 (pip --version)
- google-cloud-bigquery version: google-cloud-bigquery==2.6.2 (pip show google-cloud-bigquery), with pyarrow==2.0.0
Steps to reproduce
- Create a large DataFrame (1000 rows) with a column containing a list (of length at least 6) of identically structured dictionaries.
- Create a BigQuery client and use load_table_from_dataframe to create a table in BigQuery.
- Check the resulting table in BigQuery. Structs appear to swap field values with other structs in the same list. For example, the list should be [STRUCT('w0' AS name, 0.1 AS value), STRUCT('h1' AS name, 1.2 AS value)] but is [STRUCT('h1' AS name, 0.1 AS value), STRUCT('w0' AS name, 1.2 AS value)]. The main problem is not the order but that the integrity of each struct is not preserved (e.g. 'w0' should be paired with 0.1, not 1.2).
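To tell a harmless reordering apart from the corruption described in the last step, the (name, value) pairs of a row can be compared without regard to order. This is only a minimal sketch and not part of the reproduction itself: the helper name `pairs_preserved` is hypothetical, and it assumes each row has already been read back as a list of dicts.

```python
from collections import Counter

def pairs_preserved(expected, actual):
    """Return True if both lists hold the same (name, value) pairs, in any order."""
    def as_pairs(structs):
        return Counter((s["name"], s["value"]) for s in structs)
    return as_pairs(expected) == as_pairs(actual)

expected = [{"name": "w0", "value": 0.1}, {"name": "h1", "value": 1.2}]
# Same structs in a different order: the pairing is intact.
reordered = [{"name": "h1", "value": 1.2}, {"name": "w0", "value": 0.1}]
# Names swapped between structs: the pairing is broken (the reported bug).
corrupted = [{"name": "h1", "value": 0.1}, {"name": "w0", "value": 1.2}]

print(pairs_preserved(expected, reordered))  # True
print(pairs_preserved(expected, corrupted))  # False
```

The comparison uses exact float equality, which assumes the values come back unchanged from the table.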
Code example
import numpy as np
import pandas as pd
from google.cloud import bigquery

# Create a DataFrame with one column holding a list of dictionaries.
# In this example, the dict structure is {"name": str, "value": float}:
# name is a letter followed by its index, and value is a random float
# whose scale grows with the index.
data = [
    [[{'name': 'whyist'[i] + str(i), 'value': np.random.random() * 10**i} for i in range(6)]]
    for n in range(1000)
]
df = pd.DataFrame(data, columns=['vals'])

# Load the DataFrame into BigQuery.
project = 'myproject'
bq_client = bigquery.Client(project=project)
job_config = bigquery.LoadJobConfig()
job_config.write_disposition = 'WRITE_TRUNCATE'
load_job = bq_client.load_table_from_dataframe(
    dataframe=df,
    destination='tmp.test_bug',
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish
Checking the resulting table in BigQuery:
At least for this example, the 'value' attribute is written in the correct order (the first item has the smallest value, and the values increase along the list). The 'name' values, however, look as if they were sampled from the original names with replacement, so names can repeat. Every table row has the same 'name' values in the same order, and that order can change when the code is re-executed.
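The observation about the 'name' values can be checked mechanically once the rows are read back. The sketch below is an illustration under assumptions, not the method used in this report: `name_sequences`, `names_intact`, and `make_row` are hypothetical helpers, and the rows here are hand-built stand-ins for rows fetched from the table.

```python
# Names generated by the reproduction code: 'w0', 'h1', 'y2', 'i3', 's4', 't5'.
EXPECTED_NAMES = tuple("whyist"[i] + str(i) for i in range(6))

def name_sequences(rows):
    """Return the set of distinct per-row 'name' sequences."""
    return {tuple(struct["name"] for struct in row) for row in rows}

def names_intact(rows):
    """True only if every row has exactly the generated names, in order."""
    return name_sequences(rows) == {EXPECTED_NAMES}

def make_row(names):
    # Hypothetical stand-in for one row read back from the table.
    return [{"name": n, "value": float(10**i)} for i, n in enumerate(names)]

good_rows = [make_row(EXPECTED_NAMES) for _ in range(3)]
# Every row shares one sequence, but it is not the generated one.
bad_rows = [make_row(("h1", "h1", "w0", "t5", "y2", "i3")) for _ in range(3)]

print(names_intact(good_rows))  # True
print(names_intact(bad_rows))   # False
```

A single distinct sequence that differs from the expected one matches the behavior described above: all rows agree with each other but not with the source data.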