Skip to content

Latest commit

 

History

History
232 lines (189 loc) · 7.49 KB

File metadata and controls

232 lines (189 loc) · 7.49 KB
comments true
difficulty Hard
edit_url https://github.com/doocs/leetcode/edit/main/solution/3800-3899/3832.Find%20Users%20with%20Persistent%20Behavior%20Patterns/README_EN.md
tags
Database

中文文档

Description

Table: activity

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| user_id      | int     |
| action_date  | date    |
| action       | varchar |
+--------------+---------+
(user_id, action_date, action) is the primary key (unique value) for this table.
Each row represents a user performing a specific action on a given date.

Write a solution to identify behaviorally stable users based on the following definition:

  • A user is considered behaviorally stable if there exists a sequence of at least 5 consecutive days such that:
    <ul>
    	<li>The user performed <strong>exactly one action per day</strong> during that period.</li>
    	<li>The <strong>action is the same</strong> on all those consecutive days.</li>
    </ul>
    </li>
    <li>If a user has multiple qualifying sequences, only consider the sequence with the <strong>maximum length</strong>.</li>
    

Return the result table ordered by streak_length in descending order, then by user_id in ascending order.

The result format is in the following example.

 

Example:

Input:

activity table:

+---------+-------------+--------+
| user_id | action_date | action |
+---------+-------------+--------+
| 1       | 2024-01-01  | login  |
| 1       | 2024-01-02  | login  |
| 1       | 2024-01-03  | login  |
| 1       | 2024-01-04  | login  |
| 1       | 2024-01-05  | login  |
| 1       | 2024-01-06  | logout |
| 2       | 2024-01-01  | click  |
| 2       | 2024-01-02  | click  |
| 2       | 2024-01-03  | click  |
| 2       | 2024-01-04  | click  |
| 3       | 2024-01-01  | view   |
| 3       | 2024-01-02  | view   |
| 3       | 2024-01-03  | view   |
| 3       | 2024-01-04  | view   |
| 3       | 2024-01-05  | view   |
| 3       | 2024-01-06  | view   |
| 3       | 2024-01-07  | view   |
+---------+-------------+--------+

Output:

+---------+--------+---------------+------------+------------+
| user_id | action | streak_length | start_date | end_date   |
+---------+--------+---------------+------------+------------+
| 3       | view   | 7             | 2024-01-01 | 2024-01-07 |
| 1       | login  | 5             | 2024-01-01 | 2024-01-05 |
+---------+--------+---------------+------------+------------+

Explanation:

  • User 1:
    <ul>
    	<li>Performed <code>login</code> from 2024-01-01 to 2024-01-05 on consecutive days</li>
    	<li>Each day has exactly one action, and the action is the same</li>
    	<li>Streak length = 5 (meets minimum requirement)</li>
    	<li>The action changes on 2024-01-06, ending the streak</li>
    </ul>
    </li>
    <li><strong>User 2</strong>:
    <ul>
    	<li>Performed <code>click</code> for only 4 consecutive days</li>
    	<li>Does not meet the minimum streak length of 5</li>
    	<li>Excluded from the result</li>
    </ul>
    </li>
    <li><strong>User 3</strong>:
    <ul>
    	<li>Performed <code>view</code> for 7 consecutive days</li>
    	<li>This is the longest valid sequence for this user</li>
    	<li>Included in the result</li>
    </ul>
    </li>
    

The Results table is ordered by streak_length in descending order, then by user_id in ascending order

Solutions

Solution 1: Filtering + Grouping + Aggregation

We first need to filter user dates with only a single action per day, then identify consecutive intervals among these dates, and finally aggregate these intervals to calculate the streak length and filter records that meet the criteria.

MySQL

# Write your MySQL query statement below
WITH
    daily_counts AS (
        -- Step 1: Filter user dates with exactly one record per day (meeting the requirement of "exactly one action per day")
        SELECT
            user_id,
            action_date,
            action,
            COUNT(*) OVER (PARTITION BY user_id, action_date) AS cnt
        FROM activity
    ),
    filtered_activity AS (
        -- Step 2: Filter out data with multiple actions on the same day
        SELECT user_id, action_date, action
        FROM daily_counts
        WHERE cnt = 1
    ),
    streak_groups AS (
        -- Step 3: Group consecutive dates using the method of subtracting row number from date
        SELECT
            user_id,
            action,
            action_date,
            DATE_SUB(
                action_date,
                INTERVAL ROW_NUMBER() OVER (
                    PARTITION BY user_id, action
                    ORDER BY action_date
                ) DAY
            ) AS grp
        FROM filtered_activity
    ),
    streak_summary AS (
        -- Step 4: Calculate the length of each consecutive segment and only keep records with length >= 5
        SELECT
            user_id,
            action,
            COUNT(*) AS streak_length,
            MIN(action_date) AS start_date,
            MAX(action_date) AS end_date,
            -- Sort different streaks for each user to facilitate getting the maximum value later
            ROW_NUMBER() OVER (
                PARTITION BY user_id
                ORDER BY COUNT(*) DESC
            ) AS rnk
        FROM streak_groups
        GROUP BY user_id, action, grp
        HAVING streak_length >= 5
    )
-- Step 5: Extract the longest record for each qualified user and sort
SELECT user_id, action, streak_length, start_date, end_date
FROM streak_summary
WHERE rnk = 1
ORDER BY streak_length DESC, user_id ASC;

Pandas

import pandas as pd

def find_behaviorally_stable_users(activity: pd.DataFrame) -> pd.DataFrame:
    activity['action_date'] = pd.to_datetime(activity['action_date'])

    # Filter users with only a single action per day
    df = activity.assign(cnt=activity.groupby(['user_id', 'action_date'])['action'].transform('count'))
    df = df[df['cnt'] == 1].sort_values(['user_id', 'action', 'action_date'])

    # Identify consecutive intervals
    df['rn'] = df.groupby(['user_id', 'action'])['action_date'].rank(method='first')
    df['grp'] = df['action_date'] - pd.to_timedelta(df['rn'], unit='D')

    # Aggregate streaks
    streaks = df.groupby(['user_id', 'action', 'grp']).agg(
        streak_length=('action_date', 'count'),
        start_date=('action_date', 'min'),
        end_date=('action_date', 'max')
    ).reset_index()

    # Filter and get the longest streak for each user
    res = streaks[streaks['streak_length'] >= 5].sort_values(
        ['streak_length', 'user_id'], ascending=[False, True]
    )

    return res.groupby('user_id').head(1)[['user_id', 'action', 'streak_length', 'start_date', 'end_date']]