
Commit 7a09e80

Recursively create tablespace directories if they do not exist but are needed when re-redoing some tablespace-related xlogs (e.g. CREATE DATABASE with a tablespace) on the mirror.
Recently we have often seen the gp_replica_check test fail because some mirror nodes cannot be brought up before testing. The related log looks like this:

    2019-04-17 14:52:14.951 CST [23030] FATAL: could not create directory "pg_tblspc/65546/PG_12_201904072/65547": No such file or directory
    2019-04-17 14:52:14.951 CST [23030] CONTEXT: WAL redo at 0/3011650 for Database/CREATE: copy dir 1663/1 to 65546/65547

The failure is not caused by gp_replica_check itself: some mirror nodes cannot recover after earlier tests, and the root cause is in tablespace recovery. Pengzhou Tang and Hao Wu initially dug into this and kindly found a minimal repro.

Run in a shell:

    rm -rf /tmp/some_isolation2_pg_basebackup_tablespace
    mkdir -p /tmp/some_isolation2_pg_basebackup_tablespace

Then copy and run the following SQL in a psql client:

    drop tablespace if exists some_isolation2_pg_basebackup_tablespace;
    create tablespace some_isolation2_pg_basebackup_tablespace location '/tmp/some_isolation2_pg_basebackup_tablespace';
    \!gpstop -ra -M fast;
    drop database if exists some_database_with_tablespace;
    create database some_database_with_tablespace tablespace some_isolation2_pg_basebackup_tablespace;
    drop database some_database_with_tablespace;
    drop tablespace some_isolation2_pg_basebackup_tablespace;
    \!gpstop -ra -M immediate;

On the mirror, after the drop database and drop tablespace, the 'immediate' stop leaves pg_control without the latest redo start LSN (this is allowed). When the node restarts, it re-redoes 'create database some_database_with_tablespace tablespace some_isolation2_pg_basebackup_tablespace', but the tablespace directories were already deleted during the earlier redo.

The same 'could not create directory' error can also happen when re-redoing create table in a tablespace. We have seen that case in the CI environment, but there it was caused by a missing get_parent_directory() call in the 'create two parents up' code block in TablespaceCreateDbspace().
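The failure mode can be reproduced outside the server with a small C sketch. It is a simplification, not Greenplum code: the hypothetical helpers `replay_copy_target()` and `simulate_re_redo()` model the tablespace directories as plain directories under a scratch directory in /tmp (there is no pg_tblspc symlink), and they assume the copy step creates its target with a single non-recursive mkdir(). Replaying the create database step a second time, after the drop steps have removed the directories, fails with ENOENT.

```c
#define _POSIX_C_SOURCE 200809L /* for mkdtemp() */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the copy step of a "Database/CREATE: copy dir"
 * record: the target directory is created with one non-recursive mkdir(),
 * so all of its parents must already exist.
 * Returns 0 on success, otherwise the errno reported by mkdir().
 */
static int replay_copy_target(const char *dst_path)
{
    if (mkdir(dst_path, S_IRWXU) != 0)
    {
        int err = errno;

        fprintf(stderr, "FATAL: could not create directory \"%s\": %s\n",
                dst_path, strerror(err));
        return err;
    }
    return 0;
}

/*
 * Replays the same create database step twice on a scratch tree, with the
 * effect of drop database / drop tablespace (directory removal) in between.
 * Returns the result of the second replay, or -1 if the setup failed.
 */
static int simulate_re_redo(void)
{
    char base[] = "/tmp/re_redo_demoXXXXXX";
    char spc[256], ver[256], db[256];
    int  rc = -1;

    if (mkdtemp(base) == NULL)
        return -1;
    snprintf(spc, sizeof spc, "%s/65546", base);
    snprintf(ver, sizeof ver, "%s/PG_12_201904072", spc);
    snprintf(db, sizeof db, "%s/65547", ver);

    /* First redo: the tablespace directories exist, so the copy succeeds. */
    if (mkdir(spc, S_IRWXU) == 0 && mkdir(ver, S_IRWXU) == 0 &&
        replay_copy_target(db) == 0)
    {
        /* Later records: drop database, then drop tablespace. */
        rmdir(db);
        rmdir(ver);
        rmdir(spc);

        /* After an immediate stop, redo starts earlier and replays the create again. */
        rc = replay_copy_target(db);
    }
    rmdir(base);
    return rc;
}
```

Calling `simulate_re_redo()` returns ENOENT, and the second replay prints a FATAL line shaped like the one in the log above.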
In TablespaceCreateDbspace(), this patch replaces the mkdir()-based parent-creation blocks with a single, simpler pg_mkdir_p() call.

It also seems that src_path could be missing in dbase_redo(). For example, with the sequence below, re-redoing from the alter step fails because the tbs1 directory is deleted by the later 'drop tablespace tbs1':

    alter database db1 set tablespace tbs2;
    drop tablespace tbs1;

There is a discussion upstream about this: https://www.postgresql.org/message-id/flat/CAEET0ZGx9AvioViLf7nbR_8tH9-%3D27DN5xWJ2P9-ROH16e4JUA%40mail.gmail.com

In this patch I recreate those directories to avoid the error. Other possible solutions include ignoring the directory-does-not-exist error, or forcing a flush when redoing the kind of checkpoint xlogs that are normally added in drop database, etc. Let's revert or update this code change once the solution is finalized upstream.
1 parent 433a6eb commit 7a09e80

2 files changed

Lines changed: 26 additions & 25 deletions


src/backend/commands/dbcommands.c

Lines changed: 25 additions & 0 deletions
@@ -2187,6 +2187,7 @@ dbase_redo(XLogRecPtr beginLoc __attribute__((unused)), XLogRecPtr lsn __attri
 		xl_dbase_create_rec *xlrec = (xl_dbase_create_rec *) XLogRecGetData(record);
 		char *src_path;
 		char *dst_path;
+		char *parentdir;
 		struct stat st;
 
 		src_path = GetDatabasePath(xlrec->src_db_id, xlrec->src_tablespace_id);
@@ -2206,6 +2207,30 @@ dbase_redo(XLogRecPtr beginLoc __attribute__((unused)), XLogRecPtr lsn __attri
 								dst_path)));
 		}
 
+		/*
+		 * It is possible that the tablespace was later dropped, but we are
+		 * re-redoing database create before that. In that case,
+		 * either src_path or dst_path is probably missing here and needs to
+		 * be created. We create directories here so that copy_dir() won't
+		 * fail, but do not bother to create the symlink under pg_tblspc
+		 * if the tablespace is not global/default.
+		 */
+		if (stat(src_path, &st) != 0 && pg_mkdir_p(src_path, S_IRWXU) != 0)
+		{
+			ereport(WARNING,
+					(errmsg("can not recursively create directory \"%s\"",
+							src_path)));
+		}
+		parentdir = pstrdup(dst_path);
+		get_parent_directory(parentdir);
+		if (stat(parentdir, &st) != 0 && pg_mkdir_p(parentdir, S_IRWXU) != 0)
+		{
+			ereport(WARNING,
+					(errmsg("can not recursively create directory \"%s\"",
+							parentdir)));
+		}
+		pfree(parentdir);
+
 		/*
 		 * Force dirty buffers out to disk, to ensure source database is
 		 * up-to-date for the copy.
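The guard added to dbase_redo() can be approximated outside the server with the C sketch below. It is an illustration under stated assumptions, not the patched code: `mkdir_p()` is a hypothetical, simplified substitute for pg_mkdir_p() (it does not check that an existing component is a directory), `ensure_copy_dirs()` mirrors the two stat()/create checks, and `demo_guarded_replay()` uses a scratch directory in /tmp in place of the data directory. Like the patch, it creates only the parent of dst_path; the reason assumed here is that the copy step creates dst_path itself with a plain mkdir().

```c
#define _POSIX_C_SOURCE 200809L /* for mkdtemp() */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical, simplified stand-in for pg_mkdir_p(): creates every missing
 * component of `path`.  Unlike pg_mkdir_p(), it does not verify that an
 * existing component is a directory.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int mkdir_p(const char *path, mode_t mode)
{
    char        buf[4096];
    struct stat st;
    size_t      len = strlen(path);

    if (len == 0 || len >= sizeof buf)
    {
        errno = (len == 0) ? ENOENT : ENAMETOOLONG;
        return -1;
    }
    memcpy(buf, path, len + 1);

    /* Cut the path at each '/' after the first character and create that prefix. */
    for (char *p = buf + 1; *p != '\0'; p++)
    {
        if (*p != '/')
            continue;
        *p = '\0';
        if (stat(buf, &st) != 0 && mkdir(buf, mode) != 0 && errno != EEXIST)
            return -1;
        *p = '/';
    }
    if (stat(buf, &st) != 0 && mkdir(buf, mode) != 0 && errno != EEXIST)
        return -1;
    return 0;
}

/*
 * Guard modeled on the patch: make sure src_path exists and that the parent
 * of dst_path exists, but leave dst_path itself to the copy step.  Failures
 * are only reported, as with the WARNING in the patch.
 * Assumes dst_path has no trailing '/'.
 */
static void ensure_copy_dirs(const char *src_path, const char *dst_path)
{
    struct stat st;
    char        parent[4096];
    char       *slash;

    if (stat(src_path, &st) != 0 && mkdir_p(src_path, S_IRWXU) != 0)
        fprintf(stderr, "WARNING: could not recursively create directory \"%s\"\n", src_path);

    snprintf(parent, sizeof parent, "%s", dst_path);
    slash = strrchr(parent, '/');
    if (slash == NULL || slash == parent)
        return;
    *slash = '\0';              /* one level up, like get_parent_directory() */
    if (stat(parent, &st) != 0 && mkdir_p(parent, S_IRWXU) != 0)
        fprintf(stderr, "WARNING: could not recursively create directory \"%s\"\n", parent);
}

/*
 * Re-redo of "copy dir 1663/1 to 65546/65547" after the directories are gone.
 * Returns 0 if the copy step's plain mkdir() of the target succeeds.
 * The scratch tree is left in /tmp for inspection.
 */
static int demo_guarded_replay(void)
{
    char base[] = "/tmp/guard_demoXXXXXX";
    char src[256], dst[256];

    if (mkdtemp(base) == NULL)
        return -1;
    snprintf(src, sizeof src, "%s/base/1", base);
    snprintf(dst, sizeof dst, "%s/pg_tblspc/65546/PG_12_201904072/65547", base);

    ensure_copy_dirs(src, dst);
    return mkdir(dst, S_IRWXU);  /* what the copy step would do first */
}
```

`demo_guarded_replay()` returns 0: the guard creates base/1 and pg_tblspc/65546/PG_12_201904072 under the scratch directory, after which the plain mkdir() of the target succeeds.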

src/backend/commands/tablespace.c

Lines changed: 1 addition & 25 deletions
@@ -171,8 +171,6 @@ TablespaceCreateDbspace(Oid spcNode, Oid dbNode, bool isRedo)
 				/* Directory creation failed? */
 				if (mkdir(dir, S_IRWXU) < 0)
 				{
-					char *parentdir;
-
 					/* Failure other than not exists or not in WAL replay? */
 					if (errno != ENOENT || !isRedo)
 						ereport(ERROR,
@@ -186,30 +184,8 @@ TablespaceCreateDbspace(Oid spcNode, Oid dbNode, bool isRedo)
 					 * than a symlink.
 					 */
 
-					/* create two parents up if not exist */
-					parentdir = pstrdup(dir);
-					get_parent_directory(parentdir);
-					/* Can't create parent and it doesn't already exist? */
-					if (mkdir(parentdir, S_IRWXU) < 0 && errno != EEXIST)
-						ereport(ERROR,
-								(errcode_for_file_access(),
-								 errmsg("could not create directory \"%s\": %m",
-										parentdir)));
-					pfree(parentdir);
-
-					/* create one parent up if not exist */
-					parentdir = pstrdup(dir);
-					get_parent_directory(parentdir);
-					/* Can't create parent and it doesn't already exist? */
-					if (mkdir(parentdir, S_IRWXU) < 0 && errno != EEXIST)
-						ereport(ERROR,
-								(errcode_for_file_access(),
-								 errmsg("could not create directory \"%s\": %m",
-										parentdir)));
-					pfree(parentdir);
-
 					/* Create database directory */
-					if (mkdir(dir, S_IRWXU) < 0)
+					if (pg_mkdir_p(dir, S_IRWXU) < 0)
 						ereport(ERROR,
 								(errcode_for_file_access(),
 								 errmsg("could not create directory \"%s\": %m",
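The bug behind the CI failure can be shown with a short C sketch. `strip_last_component()` is a hypothetical stand-in for get_parent_directory() that only handles paths without a trailing '/', and `parent_targets()` is illustrative: it contrasts the path the removed 'create two parents up' block actually targeted (one strip) with the path two levels up (two strips). Because the first mkdir() therefore targeted the version directory while pg_tblspc/<spc> was still missing, it failed with ENOENT and the block raised ERROR. pg_mkdir_p() sidesteps the level counting by creating every missing component.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for get_parent_directory(): strips the last path
 * component in place ("a/b/c" -> "a/b").  Assumes no trailing '/'.
 */
static void strip_last_component(char *path)
{
    char *slash = strrchr(path, '/');

    if (slash != NULL)
        *slash = '\0';
}

/*
 * Writes the directory the removed "create two parents up" block tried to
 * create (buggy: one strip) and the directory two levels up (fixed: two
 * strips) into caller-provided buffers of size n.
 */
static void parent_targets(const char *dir, char *buggy, char *fixed, size_t n)
{
    snprintf(buggy, n, "%s", dir);
    strip_last_component(buggy);        /* one call: only one level up */

    snprintf(fixed, n, "%s", dir);
    strip_last_component(fixed);
    strip_last_component(fixed);        /* two calls: really two levels up */
}
```

For example, with dir set to pg_tblspc/65546/PG_12_201904072/65547, the buggy target is pg_tblspc/65546/PG_12_201904072, the same directory the 'one parent up' block creates, while the two-level target is pg_tblspc/65546.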
